Combine Person Name and Person Identity Recognition and Document Clustering for Chinese Person Name Disambiguation
Abstract
This paper presents the HITSZ_CITYU system for Task 3 of the CIPS-SIGHAN 2010 bakeoff, Chinese person name disambiguation. The system combines person name string recognition, person identity string recognition, and agglomerative hierarchical clustering to group documents by the person they refer to. First, for a given name index string, three segmenters are each applied to segment the sentences containing the index string into Chinese words. Their outputs are compared and analyzed, and unsupervised clustering is applied to help the person name recognition. The document set is then divided into subsets according to each recognized person name string. Next, the system extracts the person identity string from the sentences using a lexicon and heuristic rules. Using the recognized person identity string, person name, organization name, and contextual content words as features, agglomerative hierarchical clustering groups similar documents within each subset to produce the final person name disambiguation results. Evaluations show that the proposed system, which combines extraction and clustering techniques, achieves encouraging recall and good overall performance.
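The final clustering step described in the abstract can be illustrated with a minimal sketch. The abstract does not state which similarity measure, linkage criterion, or stopping rule the system uses, so the Jaccard similarity, average linkage, and `threshold` value below are assumptions chosen for illustration, and the documents and feature strings are made-up placeholders rather than bakeoff data.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity between two feature sets (assumed measure)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def agglomerative_cluster(docs, threshold=0.3):
    """Average-link agglomerative clustering over document feature sets.

    docs: dict mapping a document id to a set of feature strings.
    threshold: hypothetical stopping value; merging stops once no pair
        of clusters has an average similarity of at least this value.
    """
    # Start with every document in its own cluster.
    clusters = [[doc_id] for doc_id in docs]
    while len(clusters) > 1:
        best_sim, best_pair = -1.0, None
        for i, j in combinations(range(len(clusters)), 2):
            sims = [jaccard(docs[a], docs[b])
                    for a in clusters[i] for b in clusters[j]]
            avg = sum(sims) / len(sims)
            if avg > best_sim:
                best_sim, best_pair = avg, (i, j)
        if best_sim < threshold:
            break  # remaining clusters are too dissimilar to merge
        i, j = best_pair
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return [sorted(c) for c in clusters]


# Placeholder documents sharing one name string but two identities.
docs = {
    "d1": {"name:X", "identity:professor", "org:univ_a", "physics"},
    "d2": {"name:X", "identity:professor", "org:univ_a", "lecture"},
    "d3": {"name:X", "identity:athlete", "org:club_b", "match"},
    "d4": {"name:X", "identity:athlete", "org:club_b", "goal"},
}
result = agglomerative_cluster(docs)
print(result)
```

In this toy input the shared name string alone gives a low similarity between the two groups, while the identity and organization features pull the documents of each person together, which is the intuition behind using identity strings as clustering features.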
Similar resources
Jumping Distance based Chinese Person Name Disambiguation
In this paper, we describe a Chinese person name disambiguation system for news articles and report the results obtained on the data set of the CLP 2010 Bakeoff-3. The main task of the Bakeoff is to identify different persons from the news stories that contain the same person-name string. Compared to the traditional methods, two additional features are used in our system: 1) n-grams co-occurred...
Chinese Personal Name Disambiguation Based on Person Modeling
This document presents the bakeoff results of Chinese personal name in the First CIPS-SIGHAN Joint Conference on Chinese Language Processing. The authors introduce the framework of the person disambiguation system LJPD, which uses a new person model. LJPD was built in a short time and was not given enough training and adjustment. Evaluation of LJPD shows that the precision is competitive, but the reca...
Resolving Person Names in Web People Search
Disambiguating person names in a set of documents (such as a set of web pages returned in response to a person name) is a key task for the presentation of results and the automatic profiling of experts. With largely unstructured documents and an unknown number of people with the same name the problem presents many difficulties and challenges. This chapter treats the task of person name disambig...
Clustering web people search results using fuzzy ants
Person name queries often bring up web pages that correspond to individuals sharing the same name. The Web People Search (WePS) task consists of organizing search results for ambiguous person name queries into meaningful clusters, with each cluster referring to one individual. This paper presents a fuzzy ant based clustering approach for this multi-document person name disambiguation problem. T...
Which Who are They? People Attribute Extraction and Disambiguation in Web Search Results
People name search often returns a lot of Web pages containing the strings of personal names. Due to namesake, extracting target person attributes (such as birthday, occupation, affiliation, nationality, contact information, etc.) is expected to be helpful to differentiate documents related to different people and thus group documents related to the same person. This paper presents the methodol...